On Some Optimization Heuristics for Lesk-Like WSD Algorithms
Abstract
For most English words, dictionaries give several senses: e.g., “bank” can stand for a financial institution, shore, set, etc. Automatic selection of the sense intended in a given text is crucial in many text-processing applications, such as information retrieval or machine translation: e.g., “(my account in the) bank” is to be translated into Spanish as “(mi cuenta en el) banco” whereas “(on the) bank (of the lake)” as “(en la) orilla (del lago).” To choose the optimal combination of the intended senses of all words, Lesk suggested considering the global coherence of the text, by which we mean the average relatedness between the senses chosen for all words in the text. Because the search space is high-dimensional, heuristics must be used to find a near-optimal configuration. In this paper, we discuss several such heuristics, which differ in complexity and in the quality of their results. In particular, we introduce a dimensionality reduction algorithm that reduces the complexity of computationally expensive approaches such as genetic algorithms.
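The objective described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the glosses in `SENSES` are invented toy data, `relatedness` is a simplified Lesk-style word overlap chosen for illustration, and `hill_climb` is a generic local search standing in for "a heuristic", not one of the specific heuristics the paper evaluates. All function names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical toy glosses (invented for illustration, not from a real dictionary).
SENSES = {
    "bank": ["financial institution that accepts money deposits",
             "sloping land beside a lake or river"],
    "account": ["record of money deposits held at a financial institution",
                "narrative report of an event"],
    "deposit": ["money placed in a financial institution",
                "layer of sediment left by a river"],
}

def relatedness(gloss_a, gloss_b):
    """Simplified Lesk-style relatedness: count of distinct words shared by two glosses."""
    return len(set(gloss_a.split()) & set(gloss_b.split()))

def coherence(words, choice):
    """Global coherence: average relatedness over all pairs of chosen senses."""
    pairs = list(combinations(range(len(words)), 2))
    if not pairs:
        return 0.0
    total = sum(
        relatedness(SENSES[words[i]][choice[i]], SENSES[words[j]][choice[j]])
        for i, j in pairs
    )
    return total / len(pairs)

def exhaustive(words):
    """Score every sense combination; the number of combinations is the
    product of the per-word sense counts, so this only works for tiny inputs."""
    options = (range(len(SENSES[w])) for w in words)
    return max(product(*options), key=lambda c: coherence(words, c))

def hill_climb(words, start):
    """Generic local search (an assumption, not the paper's method):
    change one word's sense at a time while coherence strictly improves."""
    choice = list(start)
    improved = True
    while improved:
        improved = False
        for i, w in enumerate(words):
            for s in range(len(SENSES[w])):
                candidate = choice[:i] + [s] + choice[i + 1:]
                if coherence(words, candidate) > coherence(words, choice):
                    choice, improved = candidate, True
    return tuple(choice)

words = ["bank", "account", "deposit"]
print(exhaustive(words))                    # (0, 0, 0): the financial senses
print(hill_climb(words, start=(1, 1, 1)))   # (0, 0, 0) on this toy input
```

On this toy input the local search reaches the same configuration as the exhaustive search while scoring far fewer combinations. In general a local search of this kind can stop at a local optimum, which is why the quality of the result depends on the heuristic chosen.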
Similar resources
A Comparative Evaluation of Word Sense Disambiguation Algorithms for German
The present paper explores a wide range of word sense disambiguation (WSD) algorithms for German. These WSD algorithms are based on a suite of semantic relatedness measures, including path-based, information-content-based, and gloss-based methods. Since the individual algorithms produce diverse results in terms of precision and thus complement each other well in terms of coverage, a set of comb...
Utilizing corpus statistics for Hindi word sense disambiguation
Word Sense Disambiguation (WSD) is the task of computational assignment of correct sense of a polysemous word in a given context. This paper compares three WSD algorithms for Hindi WSD based on corpus statistics. The first algorithm, called corpus-based Lesk, uses sense definitions and a sense tagged training corpus to learn weights of Content Words (CWs). These weights are used in the disambig...
Comparing Similarity Measures for Original WSD Lesk Algorithm
There are many measures for determining the similarity or relatedness between two words. Measures of similarity or relatedness are used in such applications as word sense disambiguation. One of the methods used to resolve WSD is the Lesk algorithm. The performance of this algorithm is connected with the similarity or relatedness between all words in the text, i.e., the success rate of WSD shoul...
SlangNet: A WordNet like resource for English Slang
We present a WordNet like structured resource for slang words and neologisms on the internet. The dynamism of language is often an indication that current language technology tools trained on today’s data, may not be able to process the language in the future. Our resource could be (1) used to augment the WordNet, (2) used in several Natural Language Processing (NLP) applications which make use...
OPTIMIZATION OF AN OFFSHORE JACKET-TYPE STRUCTURE USING META-HEURISTIC ALGORITHMS
Offshore jacket-type towers are steel structures designed and constructed in marine environments for various purposes such as oil exploration and exploitation units, oceanographic research, and undersea testing. In this paper a newly developed meta-heuristic algorithm, namely Cyclical Parthenogenesis Algorithm (CPA), is utilized for sizing optimization of a jacket-type offshore structure. The a...